ABSTRACT

In the real world a graph is often fragmented and distributed 
across different sites. This highlights the need for evaluat- 
ing queries on distributed graphs. This paper proposes dis- 
tributed evaluation algorithms for three classes of queries: 
reachability for determining whether one node can reach an- 
other, bounded reachability for deciding whether there exists 
a path of a bounded length between a pair of nodes, and 
regular reachability for checking whether there exists a path 
connecting two nodes such that the node labels on the path 
form a string in a given regular expression. We develop these 
algorithms based on partial evaluation, to explore parallel 
computation. When evaluating a query Q on a distributed 
graph G, we show that these algorithms possess the follow- 
ing performance guarantees, no matter how G is fragmented 
and distributed: (1) each site is visited only once; (2) the 
total network traffic is determined by the size of Q and the 
fragmentation of G, independent of the size of G; and (3) 
the response time is decided by the largest fragment of G 
rather than the entire G. In addition, we show that these 
algorithms can be readily implemented in the MapReduce 
framework. Using synthetic and real-life data, we experi- 
mentally verify that these algorithms are scalable on large 
graphs, regardless of how the graphs are distributed. 

1. INTRODUCTION 

Large real-life graphs are often fragmented and stored dis- 
tributively in different sites, e.g., social networks [27], Web 
services networks [23] and RDF graphs [16,26]. For instance, 
a graph representing a social network may be distributed 
across different servers and data centers for performance, 
management or data privacy reasons [12, 23, 25, 27] (e.g., 
social graphs of Twitter and Facebook are geo-distributed 
to different data centers [12,25]). Moreover, various data 
of people (e.g., friends, products, companies) are typically 
found in different social networks [27], and have to be taken
together when one needs to find the complete information
about a person. With this comes the need for effective
techniques to query distributed graphs, e.g., for computing
recommendations [17] and social network aggregations [27].

Figure 1: Querying a distributed social network

There have been a number of algorithms and distributed 
graph database systems for evaluating queries on distributed 
graphs (e.g., [3,6,11,29,30]). However, few of these algo- 
rithms and systems provide performance guarantees, on the 
number of visits to each site, network traffic (data shipment) 
or computational cost (response time). The need for devel- 
oping efficient distributed evaluation algorithms with per- 
formance guarantees is particularly evident for reachability 
queries, which are most commonly used in practice. 

This paper advocates evaluating queries on distributed
graphs based on partial evaluation. Partial evaluation (a.k.a.
program specialization) has proved useful in a variety
of areas including compiler generation, code optimization
and dataflow evaluation (see [18] for a survey). Intuitively,
given a function f(s, d) and part of its input s, partial
evaluation is to specialize f(s, d) with respect to the known input
s. That is, it conducts the part of f's computation that
depends only on s, and generates a partial answer, i.e., a
residual function f' that depends on the as yet unavailable
input d. This idea can be naturally applied to distributed
query evaluation. Indeed, consider a query Q posed on a graph
G that is partitioned into fragments (F_1, ..., F_n), where F_i
is stored in site S_i. To compute Q(G), each site S_i can find
the partial answer to Q in fragment F_i in parallel, by taking
F_i as the known input s while treating the fragments in the
other sites as yet unavailable input d. These partial answers
are collected and combined by a coordinator site, to derive
the answer to query Q in the entire G.
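
The notion of a residual function can be illustrated with a minimal sketch. The function contains_target and its specializer below are hypothetical toy names chosen for illustration; they are not part of the algorithms of this paper, and only mirror the roles of f(s, d), the known input s and the unavailable input d.

```python
from typing import Callable, Iterable


def contains_target(known: Iterable[str], unknown: Iterable[str], target: str) -> bool:
    """f(s, d): does target occur in the known part s or the unknown part d?"""
    return target in set(known) or target in set(unknown)


def specialize(known: Iterable[str], target: str) -> Callable[[Iterable[str]], bool]:
    """Partially evaluate contains_target with respect to the known input s."""
    found_locally = target in set(known)  # depends only on s, so compute it now
    if found_locally:
        return lambda unknown: True  # residual function no longer needs d
    return lambda unknown: target in set(unknown)  # residual function f'(d)


if __name__ == "__main__":
    residual = specialize(["Ann", "Walt"], "Mark")
    print(residual(["Ross", "Mark"]))
```

Once the missing input d arrives, applying the residual function gives the same answer as evaluating f(s, d) from scratch, but the work that depends only on s has already been done.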

Example 1: Figure 1 depicts a fraction G of a recommendation
network, where each node denotes a person with name
and job title (e.g., database researcher (DB), human
resource (HR)), and each directed edge indicates a
recommendation. The graph G is geo-distributed to three data centers
DC1, DC2 and DC3, each storing a fragment of G.

Consider a query Q given in Fig. 1, posed at DC1. It is to
find whether there exists a chain of recommendations from
a CTO Ann to her finance analyst (FA) Mark, through either a
list of DB people or a list of HR people. Observe that such a
path exists: (Ann, CTO) -> (Walt, HR) -> (Mat, HR) -> (Fred,
HR) -> (Emmy, HR) -> (Ross, HR) -> (Mark, FA). However,
it is nontrivial to verify this in the distributed setting. A
naive method is to first ship data from DC1, DC2 and DC3
to a single site, and then evaluate the query using an
algorithm developed for centralized data (i.e., graphs stored
in a single site). This is infeasible because its data
shipment may be prohibitively expensive and, worse still, may
not even be allowed for data privacy. Another way is to use
a distributed graph traversal algorithm, by sending messages
between different sites. This, however, requires messages to
be sent along DC1 -> DC2 -> DC1 -> DC2 -> DC3 -> DC1,
incurring an unbounded number of visits to each site, excessive
communication cost, and unnecessary delay in response.

We can do better by using partial evaluation. We send 
the query Q to DC1, DC2 and DC3, as is. We compute the
partial answers to (sub-queries of) Q at each site, in parallel, 
by taking the fragment residing in the site as known input 
and introducing Boolean variables to indicate unknown in- 
put (i.e., fragments in the other sites). The partial answers 
are vectors of Boolean formulas, one associated with each 
node that has an edge from a fragment stored at another 
site. These Boolean formulas indicate that (1) at DC1, from Ann
there exist an HR path to Walt and a DB path to Bill, and
from Fred there is an HR path to Emmy; (2) at DC2, there
exist an HR path from Emmy to Ross and an HR path from Mat
to Fred; and (3) at DC3, there exists an HR path from Ross to
Mark. These partial answers are collected by a coordinator
site (DC1), which solves a system of equations formed by
these Boolean formulas that are recursively defined, to find
the truth values of those Boolean variables. It yields answer
true to Q, i.e., there exists an HR path from Ann to Mark.

We will show that this method guarantees the following: 
(1) each site is visited only once; (2) besides the query Q, 
only 2 messages are sent, all to the coordinator, and each 
message is independent of the size of G, and (3) partial eval- 
uation is conducted in parallel at each site, without waiting 
for the outcome or messages from any other site. □ 

While there has been work on query answering via par- 
tial evaluation [2,3,6,11], the previous work has focused 
on either trees [2,3,6] or non-recursive queries expressed in 
first-order logic (FO) [11]. We are not aware of any pre- 
vious algorithms based on partial evaluation for answering 
reachability queries, which are beyond FO, on possibly cyclic 
graphs that are arbitrarily fragmented and distributed. 

Contributions. We provide distributed evaluation algo- 
rithms for three classes of reachability queries commonly 
used in practice, via partial evaluation. We show that these 
algorithms possess several salient performance guarantees.

(1) Our first algorithm is developed for reachability queries
(Section 3), to decide whether two given nodes are
connected by a path [31]. We show that when evaluating such
a query on a distributed graph G, the algorithm (a) visits
each site only once, (b) is in O(|V_f||F_m|) time, and (c) ships
a total amount of data bounded by O(|V_f|^2), where
|V_f| is the number of nodes that have edges across different
sites, and |F_m| is the size of the largest fragment in G.

(2) Our second algorithm is for evaluating bounded reacha- 
bility queries (Section 4), for determining whether two given 
nodes are connected by a path of a bounded length [31]. We 



show that this algorithm has the same performance guaran- 
tees as its counterpart for reachability queries. 

(3) Our third algorithm is to evaluate regular reachability
queries (Section 5), to decide whether there exists a path
between a pair (u, v) of nodes such that the node labels on the
path satisfy a regular expression R. When evaluating such
a query on a distributed graph G, the algorithm (a) visits
each site only once, (b) is in O(|F_m||R|^2 + |R|^2|V_f|^2) time,
and (c) has network traffic bounded by O(|R|^2|V_f|^2), where
|F_m| and |V_f| are as above, and |R| is the size of regular
expression R, which is much smaller than |V_f| and |F_m|.

(4) We also develop a MapReduce [7] algorithm for evaluat- 
ing regular reachability queries (Section 6). This shows that 
partial evaluation can be readily implemented in the widely 
used MapReduce framework. The algorithm can be easily 
adapted to evaluate (bounded) reachability queries, which 
are special cases of regular reachability queries. 

(5) We experimentally evaluate the efficiency and scalability 
of our algorithms (Section 7). We find that our algorithms
scale well with both the size of graphs and the number of 
fragments. For instance, it takes 16 seconds to answer a 
regular reachability query on graphs with 1.5M (million) 
nodes and 2.1M edges, partitioned into 10 fragments. We 
also find that the communication cost of our algorithms is 
low. Indeed, the amount of data shipped by our algorithms 
is no more than 11% of the graphs on average. For
reachability queries on real-life graphs, our algorithms take only
6% of the running time of the algorithms based on message
passing [21], and visit each site only once as opposed to 625
visits on average by its counterpart [21]. In addition, our
MapReduce algorithm is efficient. 

We contend that partial evaluation yields a promising ap- 
proach to evaluating queries on distributed graphs. It guar- 
antees that (1) the number of visits to each site is min- 
imum; (2) the total network traffic is independent of the 
size of the entire graph; (3) the evaluation is conducted in 
parallel, and its cost depends on the largest fragment of a 
partitioned graph and the number of nodes with edges to 
different sites, rather than the entire graph; and (4) it im- 
poses no constraints on how the graph is fragmented and 
distributed. Moreover, it can be readily implemented in the 
MapReduce model, as verified in our experimental study. 

Related Work. We categorize related work as follows. 

Distributed databases. A variety of distributed database
systems have been developed. (1) Distributed relational
databases (see [24]) can store graphs in distributed
relational tables, but do not support efficient graph query
evaluation [8, 9]. (2) Non-relational distributed data storage
systems manage distributed data via various data structures, e.g.,
sorted maps [4] and key/value pairs [8]. These systems are built
for primary-key only operations [8,9], or simple graph queries
(e.g., degree, neighborhood; see footnote 1), but do not efficiently
support distributed reachability queries. (3) Distributed graph
databases. Neo4j (footnote 1) is a graph database optimized for graph
traversal. Trinity (footnote 2) and HyperGraphDB (footnote 3) are
distributed systems based on hypergraphs. Unfortunately, they do not
support efficient distributed (regular) reachability queries.



1 http://neo4j.org/

2 http://research.microsoft.com/en-us/projects/trinity/ 

3 http://www.kobrix.com/hgdb.jsp

Closer to our work is Pregel [21], a distributed graph
querying system based on message passing. It partitions a
graph into clusters, and selects a master machine to assign 
each part to a slave machine. A graph algorithm allows (a) 
the nodes in each slave machine to send messages to each 
other, and (b) the master machine to communicate with 
slave machines. Several algorithms (distance, etc.)
supported by Pregel are addressed in [21]. Similar
message-sending approaches are also developed in [13]. These
algorithms differ from ours as follows. (a) In contrast to our
algorithms, the message passing model in Pregel may seri- 
alize operations that can be conducted in parallel, and have 
no bound on the number of visits to each site, as shown by 
our experimental study (Section 7). (b) How to support reg- 
ular reachability query is not studied in [21]. On the other 
hand, the techniques of Pregel can be combined with partial 
evaluation to support local processing of reachability queries 
at each site (see Section 3). 

Distributed graph query evaluation. Several algorithms have 
been developed for evaluating queries on distributed graphs 
(see [19] for a survey). (1) Querying distributed trees [2,3,6]. 
Partial evaluation is used to evaluate XPath queries on dis- 
tributed XML data modeled as trees [3,6], as well as for 
evaluating regular path queries [2]. It is nontrivial, however, 
to extend these algorithms to deal with (possibly cyclic) 
graphs. Indeed, the network traffic of [3,6] is bounded by the 
number of fragments and the size of the query, in contrast to 
the number of nodes with edges to different fragments in our 
setting. Moreover, we study (regular) reachability queries, 
which are quite different from XPath. Finally, our algo- 
rithms only visit each site once, while in [2] each site may 
be visited multiple times. (2) Querying distributed semi- 
structured data [13,28-30]. Techniques for evaluating regu- 
lar path queries on distributed, edge-labeled, rooted graphs 
are studied in [30] and extended in [29], based on message 
passing. It is guaranteed that the total network traffic is 
bounded by n^2, where n is the number of edges across
different sites. A distributed BFS algorithm is given in [28],
which takes nearly cubic time in the graph size, and needs a
table of exponential size to achieve linear time complexity;
it is impractical for large graphs. These differ from our
algorithms as follows. (a) Our algorithms guarantee that each
site is visited only once, as opposed to twice [30]. (b) As
remarked earlier, message passing may unnecessarily
serialize operations, while our algorithms explore parallelism via
partial evaluation. While an analysis of computational cost
is not given in [29,30], we show experimentally that our
algorithms outperform theirs (Section 7).

There has also been recent work on evaluating SPARQL
queries on distributed RDF graphs [11], which is not
applicable to our setting because (a) no performance guarantees or
complexity bounds are provided in [11], and (b) the queries
considered in [11] are expressible in FO, while we study
(regular) reachability queries beyond FO.

2. DISTRIBUTED GRAPHS AND QUERIES 

We start with distributed graphs (Section 2.1), reachabil- 
ity queries and a partial evaluation framework (Section 2.2). 

2.1 Distributed Graphs 

We start with basic notations of graphs. We consider 
node-labeled, directed graphs, simply referred to as graphs. 



Graphs. A graph G = (V, E, L) consists of (1) a finite set
V of nodes; (2) a set of edges E ⊆ V × V, where (v, w) ∈ E
denotes a directed edge from node v to w; and (3) a function
L defined on V such that for each node v in V, L(v) is
a label from a set Σ of labels. Intuitively, L() specifies
node attributes, e.g., names, keywords, social roles, ratings,
companies [20]; the set Σ specifies all such attributes.
We will use the following notations. 

(1) A path p from node v to w in G is a sequence of nodes
(v = v_0, v_1, ..., v_n = w) such that for every i ∈ [1, n],
(v_{i-1}, v_i) ∈ E. The length of path p, denoted by len(p),
is the number of edges in p. We define the label of p to be
the list of the labels of v_1, ..., v_{n-1}, excluding v_0 and v_n.
Abusing notations of trees, we refer to v_i as a child of v_{i-1},
and v_j as a descendant of v_i for i, j ∈ [0, n] and i < j.

We say that a node v can reach w if and only if (iff) there
is a path from v to w. The distance from v to w, denoted
by dist(v, w), is the length of the shortest paths from v to w.

(2) A node-induced subgraph G_s of G is a graph (V_s, E_s, L_s),
where (a) V_s ⊆ V, (b) there is an edge (u, v) ∈ E_s iff u, v ∈
V_s and (u, v) ∈ E, and (c) for each v ∈ V_s, L_s(v) = L(v).
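
The notations above can be made concrete with a small sketch. The graph below is a hypothetical toy instance (it is not the graph of Fig. 1), and the dictionary-based representation is an assumption made for illustration. The sketch also assumes that a node reaches itself via a path of length 0.

```python
from collections import deque

# Hypothetical toy graph G = (V, E, L): adjacency lists for E, a dict for L.
EDGES = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"], "e": []}
LABELS = {"a": "CTO", "b": "HR", "c": "DB", "d": "HR", "e": "FA"}


def dist(edges, v, w):
    """Number of edges on a shortest path from v to w, or None if v cannot reach w."""
    seen = {v}
    queue = deque([(v, 0)])
    while queue:
        node, d = queue.popleft()
        if node == w:
            return d
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, d + 1))
    return None


def induced_subgraph(edges, labels, keep):
    """Node-induced subgraph G_s: keep exactly the edges with both endpoints in keep."""
    sub_edges = {u: [v for v in edges.get(u, []) if v in keep] for u in keep}
    sub_labels = {u: labels[u] for u in keep}
    return sub_edges, sub_labels
```

Here dist is a plain breadth-first search, so it returns the length of a shortest path, and v can reach w exactly when the result is not None.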

Distributed Graphs. In practice a graph G is often
partitioned and stored in different sites [16,27]. We define a
fragmentation ℱ of a graph G = (V, E, L) as a pair (F, G_f),
where F is a collection of subgraphs of G, and G_f is called
the fragment graph of ℱ, specifying edges across distinct
sites. More specifically, F and G_f are defined as follows.

(1) F = (F_1, ..., F_k), where each fragment F_i is specified
by (V_i ∪ F_i.O, E_i ∪ cE_i, L_i) such that (a) (V_1, ..., V_k) is
a partition of V, (b) each (V_i, E_i, L_i) is a subgraph of G
induced by V_i, (c) for each node u ∈ V_i, if there exists an
edge (u, v) ∈ E, where v is in another fragment, then there
is a virtual node v in F_i.O, and (d) cE_i consists of all and
only those edges (u, v) such that u ∈ V_i and v is a virtual
node, referred to as cross edges. We also use F_i.I to denote
the set of in-nodes of F_i, i.e., those nodes u ∈ V_i such that
there exists a cross edge (v, u) incoming from a node v in
another fragment F_j to it, i.e., u is a virtual node in F_j.

Intuitively, V_i ∪ F_i.O of F_i consists of (a) those nodes in
V_i and (b) for each node in V_i that has an edge to another
fragment, a virtual node indicating the connection. The
edge set E_i ∪ cE_i consists of (a) the edges in E_i and (b) cross
edges in cE_i, i.e., edges to other fragments. In a distributed
social graph, for instance, cross edges are indicated by either
IRIs (universal unique IDs) or semantic labels of the virtual
nodes [21,27]. We also identify F_i.I, a subset of nodes in V_i
to which there are incoming edges from another fragment.

We assume w.l.o.g. that each F_i is stored at site S_i.

(2) The fragment graph G_f is defined as (V_f, E_f), where V_f
= ∪_{i ∈ [1,k]} (F_i.O ∪ F_i.I) and E_f = ∪_{i ∈ [1,k]} cE_i. Here
F_i.O ∪ F_i.I includes all the nodes in F_i that have cross edges to or
from fragment F_i. These nodes can be grouped together,
denoted by a single "hyper-node", indicating F_i. The set
E_f collects all the cross edges from all fragments.
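
The definitions of F_i.O, F_i.I, cE_i and G_f can be sketched as follows. This is an illustrative sketch only: the function names and the dictionary layout are assumptions, and the sample input contains just the four cross edges named in Example 2 (the remaining nodes and edges of Fig. 1 are omitted), with node-to-fragment assignments read off that example.

```python
def fragment(edges, part_of):
    """Split edges by a partition of the nodes.

    edges: iterable of directed edges (u, v).
    part_of: dict mapping every node to its fragment id (a partition of V).
    Returns, per fragment i: V_i, E_i, cross edges cE_i, virtual nodes O, in-nodes I.
    """
    frags = {i: {"V": set(), "E": set(), "cE": set(), "O": set(), "I": set()}
             for i in set(part_of.values())}
    for v, i in part_of.items():
        frags[i]["V"].add(v)
    for u, v in edges:
        i, j = part_of[u], part_of[v]
        if i == j:
            frags[i]["E"].add((u, v))      # edge internal to F_i
        else:
            frags[i]["cE"].add((u, v))     # cross edge leaving F_i
            frags[i]["O"].add(v)           # v is a virtual node of F_i
            frags[j]["I"].add(v)           # v is an in-node of its own fragment F_j
    return frags


def fragment_graph(frags):
    """G_f = (V_f, E_f), following the formula stated in the text."""
    v_f = set().union(*(f["O"] | f["I"] for f in frags.values()))
    e_f = set().union(*(f["cE"] for f in frags.values()))
    return v_f, e_f


# Cross edges named in Example 2 only; everything else in Fig. 1 is left out.
CROSS_EDGES = [("Fred", "Emmy"), ("Bill", "Pat"), ("Walt", "Mat"), ("Mat", "Fred")]
PART_OF = {"Fred": 1, "Bill": 1, "Walt": 1, "Emmy": 2, "Mat": 2, "Pat": 3}
```

On this partial input, fragment(CROSS_EDGES, PART_OF)[1] has virtual nodes Pat, Mat and Emmy and the single in-node Fred, which agrees with the description of F_1 in Example 2.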

Example 2: Figure 1 depicts a fragmentation ℱ of graph
G, consisting of three fragments F_1, F_2, F_3, stored in sites
DC1, DC2 and DC3, respectively. For fragment F_1, F_1.O
consists of virtual nodes Pat, Mat and Emmy, F_1.I includes
in-node Fred, and its cE set consists of cross edges (Fred,
Emmy), (Bill, Pat) and (Walt, Mat), i.e., all the edges from F_1
outgoing to another fragment; similarly for F_2 and F_3. In
particular, edges (Mat, Fred) and (Bill, Pat) are cross edges
from fragments F_2 to F_1 and F_1 to F_3, respectively.

The fragment graph G_f of ℱ is shown in Fig. 2, which
collects all in-nodes, virtual nodes and cross edges, but does
not contain any nodes and edges internal to a fragment. □

Figure 2: Fragment graph and partial evaluation

We remark that no constraints are imposed on fragmenta- 
tion, i.e., the graphs can be arbitrarily fragmented. Observe 
that multiple fragments may reside in a single site, and our 
algorithms can be easily adapted to accommodate this. 

2.2 Queries and Partial Evaluation 

Given a fragmentation ℱ of graph G and a query Q,
distributed query evaluation is to compute the answer to Q in
G, using data in ℱ. It aims to minimize (1) the number of
visits to each site, (2) the network traffic (communication 
cost), i.e., the total amount of data shipped from one site 
to another, and (3) the response time (computational cost). 

We focus on three classes of graph queries in this work. 

(1) A reachability query q_r(s, t) is to determine whether node
s can reach another node t in G.

(2) A bounded reachability query q_br(s, t, l) is to decide
whether dist(s, t) ≤ l for a given integer (bound) l.

(3) A regular reachability (path) query q_rr(s, t, R) is to
determine whether there exists a path p from s to t such that p
satisfies R. Here R is a regular expression:

R ::= ε | a | RR | R ∪ R | R*,

where ε is the empty string, a is a label in Σ, and RR, R ∪ R
and R* denote concatenation, alternation and the Kleene
closure, respectively. We say that a path p satisfies R if the
label of p is a string in the regular language defined by R.

Remark. Observe the following. (1) One can define a
"wildcard" _, which matches any label, as a_1 ∪ ... ∪ a_m,
for all a_i's in Σ. Leveraging _, reachability and bounded
reachability queries can be expressed as regular reachability
(path) queries. We study these queries separately because
(a) they admit lower complexity than regular reachability
queries, and (b) in practice, it often suffices to use these
simple queries [31], without paying the price of higher
complexity of regular path queries. (2) It is known that it is
NP-complete to determine whether there exists a simple path p
from s to t such that p satisfies a regular expression R [22].
Here we do not require p to be a simple path, i.e., we allow
multiple occurrences of the same node on p, and develop a
low polynomial time algorithm for regular path queries.

Notations in this section are summarized in Table 1.
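
One possible way to spell out the wildcard encoding mentioned in the remark is sketched below. It is an assumption made for illustration rather than a construction given in the text: it takes s ≠ t and l ≥ 1, and relies on the convention of Section 2.1 that the label of a path excludes its two endpoints, so that a path with n edges carries n - 1 labels.

```latex
\begin{align*}
\_ &= a_1 \cup \cdots \cup a_m, \qquad \Sigma = \{a_1, \ldots, a_m\},\\
q_r(s,t) &\equiv q_{rr}(s,t,\ \_^{*}),\\
q_{br}(s,t,l) &\equiv q_{rr}\bigl(s,t,\ \underbrace{(\epsilon \cup \_)\cdots(\epsilon \cup \_)}_{l-1\ \text{times}}\bigr),
\qquad l \ge 1 .
\end{align*}
```

Under these assumptions, a path with between 1 and l edges has a label of length between 0 and l - 1, which is exactly what the second expression accepts; for l = 1 the expression degenerates to ε, matching a single edge from s to t.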

Partial evaluation. Given a query Q and a fragmentation
ℱ of a graph G, we compute Q(G), a Boolean value
indicating the reachability of Q in G. Assume that Q is posed on a
site S_c, referred to as a coordinator site, in which a mapping
h from the fragments in ℱ to different sites is stored. As
shown in Fig. 2, we use partial evaluation to compute Q(G).



symbols          notations
ℱ = (F, G_f)     graph fragmentation in which G_f is the fragment graph
F_i.I            the set of in-nodes in a fragment F_i
F_i.O            the set of virtual nodes in a fragment F_i
q_r(s, t)        reachability query
q_br(s, t, l)    bounded reachability query
q_rr(s, t, R)    regular reachability query

Table 1: Notations: graphs and queries



(1) Distributing at site S_c. Upon receiving Q, the
coordinator site S_c posts Q to each fragment, as is, by using h.

(2) Local evaluation at each site S_i. Each site S_i evaluates
(sub-queries of) Q in parallel, by treating the fragment F_i
stored in S_i as the known input to Q; the other fragments F_j
are taken as the yet unavailable input, denoted by Boolean
variables associated with virtual nodes in F_i.O. The
partial answers are represented as vectors of Boolean formulas
associated with nodes in F_i.I, and are sent back to S_c.

(3) Assembling at S_c. Site S_c assembles these partial
answers to get the final answer Q(G), by using G_f.

Following this, the next three sections develop evaluation 
algorithms for (bounded, regular) reachability queries. 

3. DISTRIBUTED REACHABILITY 

We first develop distributed evaluation strategies for
reachability queries. Given a reachability query q_r(s, t) and a
fragmentation ℱ = (F, G_f) of a graph G, we decide whether
s reaches t in G. The main result of this section is as follows.

Theorem 1: Over a fragmentation ℱ = (F, G_f) of a graph
G, reachability queries can be evaluated (a) in O(|V_f||F_m|)
time, (b) by visiting each site only once, and (c) with the
total network traffic bounded by O(|V_f|^2), where G_f = (V_f, E_f)
and F_m is the largest fragment in F. □

As a proof of the theorem, we provide an algorithm to
evaluate reachability queries q_r(s, t) over a fragmentation ℱ
of a graph G. The algorithm, denoted as disReach, is given
in Fig. 3. As shown in Fig. 2, the algorithm evaluates q_r(s, t)
based on partial evaluation, in three steps as follows.

(1) The coordinator site S_c posts the same query q_r(s, t) to
each fragment in F (line 1).

(2) Upon receiving q_r(s, t), each site invokes procedure
localEval to partially evaluate q_r(s, t), in parallel (lines 3-4).
This yields a partial answer F_i.rvset from each fragment,
which is a set of Boolean equations (as will be discussed
shortly) and is sent back to the coordinator site S_c.

(3) The coordinator site S_c collects F_i.rvset from each site
and assembles them into a system RVset of Boolean
equations (line 3). It then invokes procedure evalDG to solve
these equations and finds the final answer to q_r(s, t) in G
(line 5). In contrast to partial query evaluation on trees
[2,3,6], the Boolean equations of RVset are possibly
recursively defined since graph G may have a cyclic structure.

We next present procedures localEval and evalDG, for pro- 
ducing and assembling partial answers, respectively. 

Partial evaluation. Procedure localEval evaluates q_r(v, t)
on each fragment F_i in parallel. For each in-node v in F_i, it
decides whether v reaches t. Later on procedure evalDG will
assemble such answers and find the final answer to q_r(s, t).

Let us consider how to compute q_r(v, t). If t ∈ F_i and v
can reach t, then q_r(v, t) can be locally evaluated to be true.
Otherwise, q_r(v, t) is true iff there exists a virtual node v' of
F_i such that both q_r(v, v') and q_r(v', t) are true. Indeed, in
the latter case v can reach t only via a virtual node v' that v
can reach and that in turn can reach t. Observe that q_r(v, v') can
be locally evaluated in F_i, but not q_r(v', t) since v' and t are in other
fragments. Instead of waiting for the answer of q_r(v', t), we
introduce Boolean variables, one for each virtual node v' in
F_i.O, to denote the yet unknown answer to q_r(v', t) in G.
The answer to q_r(v, t) is then a Boolean formula v.rf
associated with v, which is the disjunction of only the variables of
those virtual nodes v' that v can reach in F_i.

Algorithm disReach /* executed at the coordinator site */
Input: Fragmentation (F, G_f), reachability query q_r(s, t).
Output: The Boolean answer ans to q_r in G.

1. post query q_r(s, t) to all the fragments in F;
2. RVset := ∅;
3. for each fragment F_i in F do
4.     RVset := RVset ∪ localEval(F_i, q_r(s, t));
5. ans := evalDG(RVset);
6. return ans;

Procedure localEval /* locally at each site in parallel */
Input: A fragment F_i, a reachability query q_r(s, t).
Output: A set rvset of Boolean equations.

1. F_i.rvset := ∅; iset := F_i.I; oset := F_i.O;
2. if s ∈ F_i then iset := iset ∪ {s};
3. if t ∈ F_i then oset := oset ∪ {t};
4. for each node v ∈ oset do
5.     if v = t then v.rf := true;
6.     else v.rf := X_v;
7. for each node v ∈ iset do
8.     for each node v' ∈ oset do
9.         if v' ∈ des(v, F_i) then v.rf := v.rf ∨ v'.rf;
10.    F_i.rvset := F_i.rvset ∪ {X_v = v.rf};
11. send F_i.rvset to the coordinator site S_c;

Figure 3: Algorithm disReach

More specifically, procedure localEval works as follows. It
first initializes a set F_i.rvset of Boolean equations, and puts
the in-nodes F_i.I and virtual nodes F_i.O of F_i in sets iset
and oset, respectively (line 1). If s (resp. t) is in F_i, localEval
includes s (resp. t) in iset (resp. oset) as well (lines 2-3). A
Boolean variable X_v is associated with each node v ∈ oset ∪
iset. For each node v ∈ oset, if v is t, then v.rf is
assigned true; otherwise v.rf is the variable X_v (lines 4-6). For each
in-node v ∈ iset, localEval locally checks whether v can reach
a node v' ∈ oset (lines 8-9). If so, localEval updates
v.rf, the Boolean formula of v, to be v.rf ∨ v'.rf (line 9).
Observe that if t is in des(v, F_i), then v.rf is evaluated to be
true. Here v' ∈ des(v, F_i) denotes that v' is a descendant of
v in F_i; this can be checked using any available centralized
algorithm for reachability queries [31], locally in F_i. After
the formula of in-node v is constructed, F_i.rvset is extended
by including a Boolean equation X_v = v.rf (line 10). The set F_i.rvset
is then sent to the coordinator site S_c (line 11).
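
A minimal sketch of local evaluation at one site is given below. The representation is an assumption made for illustration: a formula v.rf is either True or a frozenset of virtual nodes whose variables are OR-ed together (an empty frozenset stands for false), a node is taken to be a descendant of itself, and the fragments used in the usage note are hypothetical rather than those of Fig. 1.

```python
def descendants(adj, v):
    """Nodes reachable from v inside the fragment (v included), by iterative DFS."""
    seen, stack = {v}, [v]
    while stack:
        for w in adj.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def local_eval(nodes, adj, in_nodes, virtual_nodes, s, t):
    """Return the Boolean equations {v: v.rf} of one fragment.

    nodes: the node set V_i; adj: adjacency lists over E_i and cE_i;
    in_nodes: F_i.I; virtual_nodes: F_i.O.
    """
    iset = set(in_nodes) | ({s} if s in nodes else set())
    oset = set(virtual_nodes) | ({t} if t in nodes else set())
    rvset = {}
    for v in iset:
        hits = descendants(adj, v) & oset      # members of oset that v reaches locally
        # Reaching t makes the whole disjunction true; otherwise keep the variables.
        rvset[v] = True if t in hits else frozenset(hits)
    return rvset
```

For a hypothetical fragment with nodes {s, a, b}, local edge (s, a), cross edges (a, x) and (b, y), and in-node b, local_eval yields X_s = X_x and X_b = X_y; for a second fragment holding x and t with edge (x, t) and in-node x, it yields X_x = true.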
Example 3: Consider a query q_r(Ann, Mark) over G in Fig. 1.
Algorithm disReach at the coordinator site DC1 first sends
the query to each site, where a set of Boolean equations are
computed, as shown below.
Observe that for each i ∈ [1, 3], each equation in F_i.rvset
is of the form X_v = ∨ X_{v'}, where v is an in-node, and v' is
a virtual node that v can reach in F_i. In particular, Ross.rf
= true since the node Ross can reach Mark in F_3. □

Procedure evalDG /* executed at the coordinator site */
Input: A system RVset of Boolean equations.
Output: The Boolean answer ans to q_r(s, t).

1. construct dependency graph G_d = (V_d, E_d, L_d) from RVset;
2. if there is no v_d ∈ V_d such that L_d(v_d) = {X_v = true}
       then return false;
3. else merge all such nodes into a node v_true;
4. if v_true ∈ des(v_s, G_d) then return true;
5. else return false;

Figure 4: Procedure evalDG

Assembling. After the local evaluation, the equations
collected in RVset at the coordinator site S_c form a Boolean
equation system (BES) [14]. It consists of equations of the
form X_v = v.rf, where v is an in-node in some fragment
F_i, and Boolean variables in v.rf are associated with virtual
nodes (out-nodes), which in turn are connected to in-nodes
of some other fragments. In particular, RVset contains a
Boolean equation X_s = s.rf, where the truth value of X_s is
the final answer to q_r(s, t). Given RVset, procedure evalDG
is to compute the truth value of X_s. Observe that equations
in RVset may be defined recursively. For example, X_Fred in
Example 3 is defined indirectly in terms of itself.

Observe that RVset has O(|V_f|) Boolean equations. It is
known that BES RVset can be solved in O(|V_f|^2) time [14].
We next present such an algorithm, based on a notion of
dependency graphs. The dependency graph of RVset is defined
as G_d = (V_d, E_d, L_d), where each v_d ∈ V_d is a Boolean variable
X_v in RVset; L_d(v_d) = {X_v = ∨ X_{v_i}} if X_v = ∨ X_{v_i} is in RVset; and
there is an edge (v_d, v'_d) ∈ E_d if and only if X_{v'} is in ∨ X_{v_i} of
L_d(v_d). Note that the size |G_d| of G_d is in O(|V_f|^2), where
G_f = (V_f, E_f) is the fragment graph of ℱ.

Based on this notion, we present procedure evalDG in
Fig. 4. It first constructs the dependency graph G_d of RVset
(line 1). It groups into a single node v_true all those nodes
(variables) that are known to be true (line 3). It returns
false if no such node exists, since no in-node can reach t in
any of the fragments (line 2). Otherwise, it returns true if v_s
(indicating X_s in X_s = s.rf) can reach v_true (lines 4-5).

Example 4: Consider the Boolean equations of Example 3.
Given these, evalDG first builds its dependency graph, shown
in Fig. 5(a). It then checks whether there is a path from X_Ann
to X_true (X_Mark). It returns true as such a path exists. □
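
The assembling step can be sketched as a reachability test on the dependency graph. The encoding below is an assumption made for illustration: each equation X_v = v.rf is stored as a dictionary entry whose value is either True or a frozenset of the variables OR-ed on the right-hand side, and a variable without an equation is treated as false. The equations in the usage note are hypothetical and are not those of Example 3.

```python
def eval_dg(rvset, s):
    """Decide the truth value of X_s in a system of disjunctive Boolean equations.

    rvset maps each variable to True or to a frozenset of variables; the
    dependency graph has an edge v -> w whenever w occurs in the equation of v.
    """
    true_nodes = {v for v, rf in rvset.items() if rf is True}   # merged into v_true
    if not true_nodes:
        return False                      # no in-node reaches t in any fragment
    seen, stack = {s}, [s]
    while stack:                          # DFS from v_s; cycles are cut by `seen`
        v = stack.pop()
        if v in true_nodes:
            return True
        for w in rvset.get(v, frozenset()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False
```

For instance, with the hypothetical system X_s = X_x, X_b = X_y, X_x = true, X_y = false, eval_dg returns true for s and false for b. A recursive system such as X_s = X_a, X_a = X_s terminates and yields false, since the search visits each variable at most once.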

Correctness. One can easily verify the following: s can
reach t in G iff there exist a positive integer l and a path
(s, x_1, ..., x_l, t) such that the x_i.rf's are built in some fragment
by localEval, and moreover, are evaluated to true by
procedure evalDG. This can be shown by induction on l.
Complexity. Algorithm disReach guarantees the following. 

The number of visits. Obviously each site is visited only 
once, when the coordinator site posts the input query. 

Total network traffic. For each fragment F_i, F_i.rvset has
|F_i.I| equations, each of |F_i.O| bits indicating the presence
or absence of variables in the Boolean formula. Hence the set
RVset consists of at most |V_f| equations, each of at most |V_f|
bits. The total network traffic is thus bounded by O(|V_f|^2),
independent of |G|, since |q_r(s, t)| is negligible.

Computational cost. Observe the following. (1) Procedure
localEval is performed on each fragment $F_i$ in parallel, and
it takes $O(|F_i||V_f|)$ time to compute $F_i$.rvset for each
fragment (see the discussion below). Hence it takes at most
$O(|V_f||F_m|)$ time to get $F_i$.rvset from all sites, where $F_m$ is
the largest fragment of $\mathcal{F}$. (2) It takes procedure evalDG
$O(|G_d|)$ time to construct the dependency graph $G_d$, and to
find whether $v_s$ reaches $v_{true}$ in $G_d$. Since $|G_d|$ is in $O(|V_f|^2)$,
and $|V_f|$ is typically much smaller than $|F_m|$ in practice, the
computational cost is bounded by $O(|F_m||V_f|)$. That is, the
response time is also independent of the size of the entire graph $G$.

Figure 5: Dependency graphs. (a) Dependency graph; (b) weighted dependency graph.

To check whether a pair of nodes are connected in a fragment
or in $G_d$, we use DFS/BFS search, and thus get the $O(|V_f||F_m|)$
(resp. $O(|V_f|^2)$) complexity. In fact, any indexing techniques
(e.g., reachability matrix [31], 2-hop index [5]) and parallel and
graph partition strategies (e.g., Pregel [21]) developed for
centralized graph query evaluation can be applied here, which
will lead to lower computational cost.

The analysis above completes the proof of Theorem 1. 

Remarks. In theory, one can compute the transitive clo- 
sure (TC) of a graph to decide whether a node can reach 
another. However, it is impractical to compute the TC over 
large graphs due to its time and space costs. Worse still, 
when the graphs are distributed, computing TC may incur 
excessive unnecessary data shipments. Indeed, we are not 
aware of any distributed algorithms that compute TC with 
performance guarantees on network traffic, even when index- 
ing structures are employed (see [31] for a survey on such in- 
dexes). In contrast, we show that in the distributed setting, 
partial evaluation promises performance guarantees. Also 
observe that in practice, the size of Vf is usually small [27] . 

4. DISTRIBUTED BOUNDED REACHABILITY QUERIES

We next develop a distributed evaluation algorithm for
bounded reachability queries $q_{br}(s, t, l)$, to decide whether
$dist(s,t) \le l$. In contrast to reachability queries, to evaluate
$q_{br}(s, t, l)$ we need to keep track of the distances for all pairs
of nodes involved. Nevertheless, we show that the algorithm
has the same performance guarantees as algorithm disReach.

Theorem 2: Over a fragmentation $\mathcal{F} = (F, G_f)$ of a graph
$G$, bounded reachability queries can be evaluated with the
same performance guarantees as for reachability queries. □

To prove Theorem 2, we outline an algorithm, denoted by
disDist (not shown), for evaluating $q_{br}(s, t, l)$ over a
fragmentation $\mathcal{F}$ of a graph $G$. It is similar to algorithm disReach for
reachability queries (Fig. 3), but it needs different strategies
for partial evaluation at individual sites and for assembling
partial answers at the coordinator site. These are carried
out by procedures localEval_d and evalDG_d, respectively.

Procedure localEval_d. To evaluate $q_{br}(s, t, l)$, for each
fragment $F_i$ and each in-node $v$ in $F_i$, we need to find $dist(v, t)$,
the distance from $v$ to $t$. To do this, we find the minimum
value of $dist(v, v') + dist(v', t)$ when $v'$ ranges over all virtual
nodes in $F_i$ that $v$ can reach. We associate a variable
$X_{v'}$ with each such $v'$ to denote $dist(v', t)$ (a numeric value).
We express the partial answer for $v$ as a formula $v.rf$.

Procedure localEval_d is similar to localEval, but differs in
that for each virtual node $v$, if $v = t$, it assigns 0 to $v.rf$,
and otherwise $v.rf$ is $X_v$. For each in-node $v \in$ iset and each
virtual node $v' \in$ oset, localEval_d locally finds the distance
from $v$ to $v'$ and uses a set $v.st$ to collect formulas $v'.rf +
dist(v, v')$ if $dist(v, v') \le l$. The set $F_i$.rvset with equations
$X_v = \min(v.st)$ is sent to the coordinator site $S_c$.
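The local step can be pictured with the following Python sketch. It is an assumption-laden illustration rather than the procedure localEval_d itself: edges are taken to have unit length, the fragment is given as an adjacency dictionary, a term is encoded as a pair (variable, offset) with `None` standing for the constant 0 assigned to $t$, and all names are hypothetical.

```python
from collections import deque

def local_eval_d(adj, in_nodes, out_nodes, t, bound):
    """For each in-node v, collect the terms of X_v = min(...) within one fragment."""
    rvset = {}
    for v in in_nodes:
        # Bounded BFS from v: nodes at distance `bound` are not expanded further.
        dist = {v: 0}
        queue = deque([v])
        while queue:
            x = queue.popleft()
            if dist[x] >= bound:
                continue
            for y in adj.get(x, ()):
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        # One term per reachable node of oset: (X_w, dist(v, w)), or (None, dist) for t.
        terms = []
        for w in sorted(out_nodes):
            if w in dist:
                terms.append((None if w == t else "X_" + w, dist[w]))
        rvset["X_" + v] = terms
    return rvset

# Hypothetical fragment: in-node a, virtual node x, and the target t stored locally.
adj = {"a": ["b"], "b": ["c", "x"], "c": ["t"]}
print(local_eval_d(adj, ["a"], {"x", "t"}, "t", 3))
print(local_eval_d(adj, ["a"], {"x", "t"}, "t", 2))
```

With bound 3 the equation for `X_a` has two terms, the constant 3 (reaching `t` locally) and `X_x + 2`; with bound 2 only `X_x + 2` survives.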

Procedure evalDG_d. Given $F_i$.rvset from all the sites,
procedure evalDG_d assembles these partial answers to find the
answer to $q_{br}(s, t, l)$ in $G$. As opposed to evalDG (Fig. 4), it
builds an edge-weighted graph $G_d = (V_d, E_d, L_d, W_d)$, where
$(V_d, E_d, L_d)$ is a labeled dependency graph as defined
before, and the weight $W_d(e)$ of an edge $e = (v_d, v'_d)$ is $dist(v_d, v'_d)$.
Note that $|V_d| \le |V_f|$ and $|E_d| \le |V_f|^2$, where $G_f = (V_f, E_f)$ is the
fragment graph of $\mathcal{F}$. The procedure then uses Dijkstra's
algorithm [32] to compute the distance $d$ from $X_s$ to $X_t$, in
time $O(|E_d| + |V_d| \log |V_d|)$, where $X_s \in V_d$ denotes the node
$s$ in $q_{br}(s, t, l)$. It returns true iff $d \le l$. One can verify that
$dist(s, t)$ in $G$ is equal to the distance from $X_s$ to $X_t$ in $G_d$.
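The assembling step can be sketched as a standard shortest-path computation over the weighted dependency graph. The Python below is a minimal illustration, assuming the graph is supplied as a dictionary from a variable to its weighted out-edges; the name `eval_dg_d` and the sample weights are hypothetical. It uses a binary heap, which gives $O((|E_d| + |V_d|) \log |V_d|)$ rather than the bound quoted above.

```python
import heapq

def eval_dg_d(edges, source, target, bound):
    """Return True iff the distance from `source` to `target` is at most `bound`."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if x == target:
            return d <= bound
        if d > dist.get(x, float("inf")):
            continue  # stale heap entry
        for y, w in edges.get(x, ()):
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return False  # target not reachable at all

# Hypothetical weighted dependency graph; the shortest path X_s -> X_a -> X_b -> X_t has length 5.
edges = {
    "X_s": [("X_a", 2), ("X_b", 5)],
    "X_a": [("X_b", 1), ("X_t", 6)],
    "X_b": [("X_t", 2)],
}
print(eval_dg_d(edges, "X_s", "X_t", 5))
print(eval_dg_d(edges, "X_s", "X_t", 4))
```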



Example 5: Given query $q_{br}$(Ann, Mark, 6) posed on graph
$G$ of Fig. 1, disDist computes a set of equations of arithmetic
formulas (not Boolean equations). The vectors for Fi are: 
After rvset is received by the coordinator $DC_1$, procedure
evalDG_d first builds a weighted dependency graph $G_d$, shown
in Fig. 5(b). It then computes the shortest path from $X_{Ann}$
to $X_{Mark}$ by applying Dijkstra's algorithm to $G_d$. It returns true
since the length of the path is 6, satisfying the distance bound. □

One can verify that algorithm disDist (1) visits each
site only once, (2) has total network traffic bounded by
$O(|V_f|^2)$, and (3) takes at most $O(|F_m||V_f|)$ time, where
$F_m$ is the largest fragment in $\mathcal{F}$. Moreover, indexing
techniques [31] can be incorporated into localEval_d and evalDG_d,
to reduce the cost of local evaluation and hence, the response
time (e.g., with constant-time lookups via a distance matrix).

5. DISTRIBUTED REGULAR REACHABILITY QUERIES

We now develop techniques to distributively evaluate
regular reachability queries. Given such a query $q_{rr}(s,t,R)$ and
a fragmentation $\mathcal{F}$ of graph $G$, the task is to find whether there
exists a path $p$ from $s$ to $t$ in $G$ such that $p$ satisfies $R$.
In contrast to (bounded) reachability queries, to evaluate
$q_{rr}(s, t, R)$ we need to collect and transmit information about
not only whether there are paths from one node to another,
but also whether the paths satisfy the complex constraint
imposed by $R$. The main result of this section is as follows.

Theorem 3: On a fragmentation $\mathcal{F} = (F, G_f)$ of graph $G$,
regular reachability queries $q_{rr}(s, t, R)$ can be evaluated (a) in
$O(|F_m||R|^2 + |R|^2|V_f|^2)$ time, (b) by visiting each site once,
and (c) with the total network traffic in $O(|R|^2|V_f|^2)$, where
$G_f = (V_f, E_f)$ and $F_m$ is the largest fragment in $\mathcal{F}$. □

To prove Theorem 3, we first introduce a notion of query 
automata (Section 5.1), and then present an evaluation al- 
gorithm based on query automata (Section 5.2). 
Figure 6: Query automaton $G_q(R)$. Panels: $G_q(R)$ with $R = (DB^* \cup HR^*)$; $G_{q'}(R')$ with $R' = ((CTO\ DB^*) \cup HR^*)$.

5.1 Query Automaton 

To effectively check whether a path satisfies a regular
expression $R$, we represent $R$ as a variation of nondeterministic
finite state automata (NFA), referred to as a query automaton.

A query automaton $G_q(R)$ of $q_{rr}(s,t,R)$ accepts paths $p$
that satisfy $R$. It is defined as $(V_q, E_q, L_q, u_s, u_t)$, where
(1) $V_q$ is a set of states, (2) $E_q \subseteq V_q \times V_q$ is a set of transitions
between the states, (3) $L_q$ is a function that assigns each
state a label in $R$, and (4) $u_s$ and $u_t$ in $V_q$ are the start
state and final state corresponding to $s$ and $t$, respectively.
In contrast to traditional NFA, at state $u_v$, for each edge
$(v, v')$ on a path, a transition $u_v \to u'_v$ can be made via
$(u_v, u'_v) \in E_q$ if $L(v) = L_q(u_v)$ and $L(v') = L_q(u'_v)$. The
automaton can be constructed in $O(|R| \log |R|)$ time, using
a conversion similar to that of [15]. It is of linear size in $|R|$.

We say that a state $u$ is a child of $u'$ (resp. $u'$ is a parent
of $u$) if $(u', u) \in E_q$, i.e., $u'$ can transit to $u$.

Example 6: Recall $q_{rr}$(Ann, Mark, $R$), the regular
reachability query given in Example 1, where $R = (DB^* \cup HR^*)$. Its
query automaton $G_q(R)$ is depicted in Fig. 6. The set $V_q$
has four states Ann, DB, HR, Mark, where the start and final
states are Ann and Mark, respectively. The set $E_q$ of
transitions is {(Ann,DB), (DB,DB), (DB,Mark), (Ann,HR), (HR,HR),
(HR,Mark)}. In contrast to an NFA, it accepts paths in, e.g.,
$G$ of Fig. 1, and its transitions are made by matching the
labels of its states with the job labels on the paths (except
the start and final states, which are labeled with names).

As another example, consider query $q_{rr}$(Walt, Mark, $R'$),
where $R' = ((CTO\ DB^*) \cup HR^*)$. Figure 6 shows its query
automaton, which has 5 states and 7 transitions, with Walt
and Mark as its start state and final state, respectively. □
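To make the transition rule concrete, the automaton of Example 6 can be encoded and run over the label sequence of a path. The Python below is a sketch under assumptions: the function name `accepts` and the encoding of a path as a list of node labels are hypothetical, and the two label sequences are made up for illustration rather than read off Fig. 1.

```python
def accepts(transitions, state_label, start, final, path_labels):
    """Return True iff the label sequence drives the automaton from `start` to `final`."""
    if not path_labels or state_label[start] != path_labels[0]:
        return False
    current = {start}
    for label in path_labels[1:]:
        # Follow a transition (u1, u2) only if the next node label matches state u2.
        current = {u2 for (u1, u2) in transitions
                   if u1 in current and state_label[u2] == label}
    return final in current

# States and transitions as listed in Example 6; each state is labeled with its own name.
states = ["Ann", "DB", "HR", "Mark"]
state_label = {u: u for u in states}
transitions = {("Ann", "DB"), ("DB", "DB"), ("DB", "Mark"),
               ("Ann", "HR"), ("HR", "HR"), ("HR", "Mark")}

print(accepts(transitions, state_label, "Ann", "Mark", ["Ann", "HR", "HR", "Mark"]))
print(accepts(transitions, state_label, "Ann", "Mark", ["Ann", "DB", "HR", "Mark"]))
```

The first sequence stays inside the HR branch and is accepted; the second mixes DB and HR and is rejected, since no transition leads from state DB to state HR.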

We say that a node $v$ in $G$ is a match of a state $u_v$ in
$G_q(R)$ iff (1) $L(v) = L_q(u_v)$, and (2) there exist a path $p$
from $v$ to $t$ and a path $p'$ from $u_v$ to $u_t$, such that $p$ and $p'$
have the same label. The lemma below shows the connection
between $q_{rr}(s,t,R)$ and $G_q(R)$, which is easy to verify.

Lemma 4: Given a graph $G$, $q_{rr}(s,t,R)$ over $G$ is true if
and only if $s$ is a match of $u_s$ in $G_q(R)$. □

5.2 Distributed Query Evaluation Algorithm 

We next present an algorithm to evaluate regular
reachability queries over a fragmentation $\mathcal{F}$ of a graph $G$. The
algorithm, denoted as disRPQ (not shown), evaluates $q_{rr}(s,t,R)$
based on partial evaluation in three steps, as follows.

(1) It first constructs the query automaton $G_q(R)$ of
$q_{rr}(s,t,R)$ at site $S_c$, and posts $G_q$ to each fragment in $\mathcal{F}$.

(2) Upon receiving $G_q(R)$, each site invokes procedure
localEval_r to compute a partial answer to $q_{rr}(s, t, R)$ by using
$G_q$, in parallel. The partial answer at each fragment $F_i$,
denoted as $F_i$.rvset, is a set of vectors. Each entry in a vector
is a Boolean formula (as will be discussed shortly).

(3) The partial answer is sent back to the coordinator site
$S_c$. The site $S_c$ collects $F_i$.rvset from each site and assembles
them into a set RVset of vectors of Boolean formulas. It then
invokes procedure evalDG_r to solve these equations and find
the final answer to $q_{rr}(s, t, R)$ in $G$.

Procedure localEval_r /* executed locally at each site, in parallel */

Input: A fragment F_i, a query automaton G_q(V_q, E_q, L_q, u_s, u_t).
Output: Partial answer to q_rr in F_i (a set rvset of vectors).

1. F_i.rvset := ∅; iset := F_i.I; oset := F_i.O;
2. if s ∈ F_i then iset := iset ∪ {s}; /* s denoted by u_s */
3. if t ∈ F_i then oset := oset ∪ {t}; /* t denoted by u_t */
4. for each node v ∈ V_i \ oset do v.visit := false;
5. for each node v ∈ oset do
6.     v.rvec := ∅;
7.     for each node u ∈ V_q do
8.         if v = t and u = u_t then v.rvec[u_t] := true;
9.         else if L(v) = L_q(u) then v.rvec[u] := X_(v,u);
10.        else v.rvec[u] := false;
11.    v.visit := true;
12. for each node v ∈ iset do
13.    v.rvec := cmpRvec(v, F_i, q_rr, G_q);
14.    F_i.rvset := F_i.rvset ∪ v.rvec;
15. send F_i.rvset to the coordinator site S_c;

Procedure cmpRvec

Input: A node v, a fragment F_i, and
       a query automaton G_q(V_q, E_q, L_q, u_s, u_t).
Output: The vector v.rvec of v, consisting of Boolean formulas.

1. if v.visit = true then return v.rvec;
2. for each node v_q ∈ V_q do rvec[v_q] := false;
3. for each node w ∈ C(v, F_i) do
4.     if w.visit = false then
5.         w.rvec := cmpRvec(w, F_i, q_rr, G_q(R));
6.     for each node v_q ∈ V_q do
7.         if L(v) = L_q(v_q) then
8.             rvec[v_q] := rvec[v_q] ∨ cmposeVec(v_q, w, w.rvec, G_q(R));
9. v.visit := true;
10. return rvec;

Figure 7: Procedures localEval_r and cmpRvec

We now present procedures localEval_r and evalDG_r.

Local evaluation. We first formulate the partial answer
$v$.rvec at each node $v$ in a fragment $F_i$. It indicates whether
$v$ is a match of some state $u$ in the query automaton $G_q$, i.e.,
$v$ reaches $t$ and moreover, satisfies the constraints imposed
by $G_q$ (Lemma 4). Hence we define $v$.rvec to be a vector
of $O(|V_q|)$ entries, where $V_q$ is the set of states in $G_q$. For
each state $u$ in $V_q$, the entry $v$.rvec[$u$] is a Boolean formula
indicating whether node $v$ matches state $u$. In contrast to its
counterparts for (bounded) reachability queries, here $v$.rvec
is a vector of Boolean formulas, instead of a single formula.

Observe that $v$ matches a state $u_v$ if and only if (1) $L(v)
= L_q(u_v)$, and (2) either $v$ is $t$, or there exist a child $w$ of $v$
and a child $u_w$ of $u_v$ such that $w$ matches $u_w$. To cope with
virtual nodes, for each $w \in F_i.O$ and each state $u_w \in V_q$, we
introduce a Boolean variable $X_{(w,u_w)}$, denoting whether $w$
matches $u_w$. The vector of each in-node $v$ in $F_i.I$ consists
of formulas defined in terms of these Boolean variables.

Based on these, we give procedure localEval_r in Fig. 7.
It first initializes a set $F_i$.rvset of vectors, and puts the
in-nodes $F_i.I$ and virtual nodes $F_i.O$ of $F_i$ in sets iset and
oset, respectively (line 1). If $s$ (resp. $t$) is in $F_i$, localEval_r
includes $s$ (resp. $t$) in iset (resp. oset) as well (lines 2-3).
For each node $v$ in $F_i$, it associates a flag $v$.visit to indicate
whether $v$.rvec is already computed, and initializes it to be
false if $v$ is not in oset (line 4). It then initializes the vector
$v$.rvec for each virtual node $v$ of $F_i$ (lines 5-11), as follows.
If $v = t$, then $v$.rvec[$u_t$] is assigned true (line 8). Otherwise,
for each state $u$ in $G_q$, if $u$ and $v$ have the same label, then
$v$.rvec[$u$] is a Boolean variable $X_{(v,u)}$, indicating whether $v$
matches $u$ (line 9); if not, $v$.rvec[$u$] is false (line 10). Once
$v$.rvec is initialized (lines 6-10), localEval_r sets $v$.visit to be
true (line 11). Then for each in-node $v$, localEval_r invokes
procedure cmpRvec to partially compute the vector of $v$, and
extends $F_i$.rvset with $v$.rvec (lines 12-14). After all in-nodes
are processed, $F_i$.rvset is sent to site $S_c$ (line 15).

Procedure cmpRvec computes the vector $v$.rvec for a node
$v$, as follows. If $v$.visit is true, it returns $v$.rvec (line 1).
Otherwise, it initializes a vector rvec (line 2). The procedure
then computes $v$.rvec following Lemma 4. For each child $w$
of $v$, if $w$ is not visited, then $w$.rvec is computed via a
recursive call of cmpRvec (lines 3-5; here $C(v, F_i)$ denotes the set
of children of $v$ in $F_i$). After $w$.rvec is known, for each state
$v_q$ in $G_q$, cmpRvec checks if $v$ and $v_q$ have the same label
(lines 6-7); if so, it uses the entries $w$.rvec[$v'_q$] to compute
rvec[$v_q$] via procedure cmposeVec (line 8). After rvec[$v_q$] is
computed for all states, $v$.visit is set true (line 9) and rvec is
returned (line 10).

Procedure cmposeVec (not shown) takes a state $v_q$ and a
node $w$ as input, and constructs a formula $f$ using formulas
in $w$.rvec. Initially $f$ is false. For each child state $v'_q$ of $v_q$,
it checks whether $w$ and $v'_q$ have the same label. If so, $f$ is
extended by taking $w$.rvec[$v'_q$] as a disjunct. The formula $f$
is returned after all child states of $v_q$ are processed.
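The interplay of the virtual-node initialization, cmpRvec and cmposeVec can be sketched in Python as follows. This is an illustrative sketch, not the procedures of Fig. 7 verbatim. Several things are assumptions: a Boolean formula is encoded as a set of disjuncts (the empty set is false, the string `"true"` is the constant true, and a tuple `("X", v, u)` is the variable $X_{(v,u)}$), a dictionary `rvec` doubles as the visit flag, the fragment is assumed acyclic so that the recursion terminates, and all function and parameter names are hypothetical.

```python
TRUE = "true"

def init_virtual(oset, t, final, states, state_label, node_label):
    """Initial vectors of the nodes in oset (virtual nodes, plus t if stored locally)."""
    rvec = {}
    for v in oset:
        vec = {}
        for u in states:
            if v == t and u == final:
                vec[u] = {TRUE}
            elif node_label[v] == state_label[u]:
                vec[u] = {("X", v, u)}      # Boolean variable X_(v,u)
            else:
                vec[u] = set()              # false
        rvec[v] = vec
    return rvec

def cmpose_vec(vq, w, w_rvec, children, state_label, node_label):
    """Disjunction of w.rvec[vq'] over the child states vq' of vq whose label matches w."""
    f = set()
    for vq2 in children.get(vq, ()):
        if node_label[w] == state_label[vq2]:
            f |= w_rvec[vq2]
    return f

def cmp_rvec(v, adj, rvec, states, children, state_label, node_label):
    if v in rvec:                           # already computed (v.visit = true)
        return rvec[v]
    vec = {u: set() for u in states}
    for w in adj.get(v, ()):
        w_rvec = cmp_rvec(w, adj, rvec, states, children, state_label, node_label)
        for vq in states:
            if node_label[v] == state_label[vq]:
                vec[vq] |= cmpose_vec(vq, w, w_rvec, children, state_label, node_label)
    rvec[v] = vec
    return vec

# Automaton of Example 6; a two-node fragment in which Emmy (HR) points to virtual node Ross (HR).
states = ["Ann", "DB", "HR", "Mark"]
state_label = {u: u for u in states}
children = {"Ann": ["DB", "HR"], "DB": ["DB", "Mark"], "HR": ["HR", "Mark"]}
node_label = {"Emmy": "HR", "Ross": "HR"}
rvec = init_virtual({"Ross"}, "Mark", "Mark", states, state_label, node_label)
emmy = cmp_rvec("Emmy", {"Emmy": ["Ross"]}, rvec, states, children, state_label, node_label)
print(emmy["HR"])
```

On this toy input the only nonempty entry of Emmy's vector is the one for state HR, which holds the single variable `("X", "Ross", "HR")`.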

Example 7: Given $q_{rr}$(Ann, Mark, $R$), the query of
Example 1 posed on the distributed graph $G$ of Fig. 1, procedure
localEval_r evaluates the query on $F_2$ as follows. For each
virtual node of $F_2$, it initializes its vector, e.g., the vector
of Ross is (false, false, $X_{(Ross,HR)}$, false), corresponding to the
states (Ann, DB, HR, Mark) in query automaton $G_q(R)$ (see
Fig. 6). It then invokes procedure cmpRvec to compute the
vector of each in-node of $F_2$. For instance, consider in-node
Emmy. Since (1) Emmy is an HR that matches state HR in
$G_q$, and (2) Emmy has a child Ross that may match state
HR, the formula Emmy.rvec[HR] is extended to $X_{(Ross,HR)}$ by
procedure cmposeVec. The final vectors for $F_2$ are:
Assembling. Procedure evalDG_r (not shown) collects the
partial answers from all the sites into a set RVset, and
assembles them to compute the answer to $q_{rr}(s,t,R)$ at the
coordinator site $S_c$. It is similar to procedure evalDG given
in Fig. 4, except that it uses a different notion of dependency
graphs. Here the dependency graph $G_d$ of RVset is defined
as $(V_d, E_d, L_d)$, where (a) for each in-node $v$ and each entry
$u$ of its vector $v$.rvec in RVset, there is a node $v_d(v,u) \in V_d$,
(b) $L_d(v_d(v,u)) = v$.rvec[$u$], a formula of the form $\bigvee X_{(v',u')}$,
and (c) there is an edge $(v_d(v,u), v_d(v',u')) \in E_d$ if and only
if $X_{(v',u')}$ appears in $L_d(v_d(v,u))$. In other words, the node
set $V_d$ of $G_d$ is defined in terms of both in-nodes in the
fragments of $\mathcal{F}$ and the states in the query automaton $G_q$.

Procedure evalDG_r constructs the dependency graph $G_d$
of RVset, and checks whether $v_d(s,u_s)$ can reach $v_d(v,u')$
for some node $v$, where $L_d(v_d(v,u'))$ is true. One can verify
that $s$ matches $u_s$ iff there exists a node $v_d(v,u') \in V_d$ with
$L_d(v_d(v,u'))$ = true, and $v_d(s,u_s)$ reaches $v_d(v,u')$.
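The reachability check over (node, state) pairs can be sketched in Python as below. The encoding is an assumption made for this sketch: RVset is a nested dictionary `rvset[v][u]` holding a set of disjuncts, where the string `"true"` is the constant true and a tuple `("X", v', u')` is the variable $X_{(v',u')}$. The name `eval_dg_r` and the sample vectors are hypothetical and simplified; they are not the vectors of Fig. 8.

```python
from collections import deque

TRUE = "true"

def eval_dg_r(rvset, s, start_state):
    """Return True iff node v_d(s, u_s) reaches a node whose label contains true."""
    source = (s, start_state)
    seen, queue = {source}, deque([source])
    while queue:
        v, u = queue.popleft()
        formula = rvset.get(v, {}).get(u, set())
        if TRUE in formula:
            return True
        for _tag, v2, u2 in formula:        # one edge per variable X_(v2,u2)
            if (v2, u2) not in seen:
                seen.add((v2, u2))
                queue.append((v2, u2))
    return False

# Hypothetical, simplified vectors along a chain Ann -> Mat -> Emmy -> Ross.
rvset = {
    "Ann": {"Ann": {("X", "Mat", "HR")}},
    "Mat": {"HR": {("X", "Emmy", "HR")}},
    "Emmy": {"HR": {("X", "Ross", "HR")}},
    "Ross": {"HR": {TRUE}},
}
print(eval_dg_r(rvset, "Ann", "Ann"))
print(eval_dg_r({"Ann": {"Ann": {("X", "Mat", "HR")}}}, "Ann", "Ann"))
```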

Example 8: Consider again query $q_{rr}$(Ann, Mark, $R$) posed
on the graph $G$ of Fig. 1. The vector sets $F_i$.rvset are
computed in parallel in all fragments $F_i$, as described in
Example 7. Upon receiving $F_i$.rvset from all the sites,
procedure evalDG_r first builds a dependency graph $G_d$ based
on the vector sets, as shown in Fig. 8. Each node, e.g.,
$v_d$(Ann, Ann), is shown together with its label, e.g., $X_{(Mat,HR)}$.
It then checks whether node $v_d$(Ann, Ann) reaches a node
with label true, which is node $v_d$(Ross, HR) here. It returns
true as the query answer, as there is a path (Ann, Mat, Fred,
Emmy, Ross, Mark) satisfying the regular expression $R$. □

Figure 8: Assembling with dependency graph

Correctness and complexity. One can readily verify the
following. (1) The algorithm disRPQ always terminates. (2)
Given a query $q_{rr}(s, t, R)$ and a fragmentation $\mathcal{F}$ of graph $G$,
algorithm disRPQ returns true iff there exists a path $p$ from
$s$ to $t$ in $G$ such that $p$ satisfies $R$. To complete the proof of
Theorem 3, observe the following about its complexity.

The number of visits. Each site is visited only once, when 
the query automaton is posted by the coordinator site. 

Total network traffic. The communication cost includes the
following: (1) $O(|G_q|\,card(F))$ for sending the query automaton
$G_q(R)$ to each site, where card($F$) is the number of
fragments, and $|G_q|$ is in $O(|R|)$; and (2) $O(|R|^2 \sum_{F_i \in F} |F_i.I||F_i.O|)$ for
sending partial answers from each fragment $F_i$ to the
coordinator site. Putting these together, the total network traffic
is in $O(|R|^2|V_f|^2)$, where $|V_f|$ is the total number of virtual
nodes, since the number card($F$) of fragments and the query size
$|R|$ are much smaller than $|V_f|$ in practice. Note that the
communication cost is independent of the size of the entire graph $G$.

Total computation. It takes $O(|R|^2 |F_m|)$ time to compute
the vector set in each fragment, in parallel, where $|F_m|$ is the
size of the largest fragment $F_m$ in $\mathcal{F}$. To see this, observe
that at each node $v$, it takes at most $O(|C(v, F_m)|\,|R|^2)$
time to construct its vector, i.e., $O(|R|^2)$ time for each child
of $v$ in $C(v, F_m)$. Moreover, each node is visited once and its
vector is computed once. Thus, in total it takes at most
$O(|F_m||R|^2)$ time to compute all the vectors. The assembling
phase takes up to $O(|R|^2|V_f|^2)$ time. Taking these together, the
total computation time is in $O(|F_m||R|^2 + |R|^2|V_f|^2)$.

6. DISTRIBUTED REACHABILITY WITH MAPREDUCE

We next present a simple MapReduce algorithm to evaluate
regular reachability queries. This algorithm just aims
to demonstrate how easy it is to support our techniques in the
MapReduce framework. More advanced MapReduce algorithms
can be readily developed based on partial evaluation.

MapReduce [7] is a software framework to support
distributed computing on large datasets with a large number
of computers (nodes). (1) The data are partitioned into
a collection of key/value pairs. Each pair is assigned to a
node (mapper) identified by its key. (2) Each mapper
processes its key/value pairs, and generates a set of intermediate
key/value pairs, by using a Map function. These pairs are
hash-partitioned based on the key. Each partition is sent
to a node (reducer) identified by the key. (3) Each reducer
produces key/value pairs via a Reduce function, and writes
them to a distributed file system as the result [7].

Figure 9: Processing path of algorithm reduceRPQ

Our MapReduce algorithm, MRdRPQ, is illustrated in
Fig. 9 and given in Fig. 10. It evaluates $q_{rr}(s, t, R)$ on graph
$G$ using procedures preMRPQ, mapRPQ and reduceRPQ. We
next present the three procedures in detail.

Procedure preMRPQ. A coordinator first generates the query
automaton $G_q(R)$ of $q_{rr}(s,t,R)$ (line 1; see Section 5). The
graph $G$ is then partitioned into $K$ fragments (line 2) using
some strategy parG, where $K$ is the number of mappers.
Each fragment $F_i$ is represented as a key/value pair, where
the key is $i \in [1, K]$, and its value is a pair <$F_i$, $G_q(R)$>
(lines 3-4). It is sent to mapper $M_i$ along with $G_q(R)$ (line 5).

Graph partitioning is conducted implicitly by the MapReduce
implementation (e.g., Hadoop), given the number $K$ of
mappers and the average size of fragments (line 2). To
explore the maximum parallelism we want the fragments to
be of equal size, hence $|G|/K$. One may also want to minimize
$\sum_{F_i \in F} |F_i.I||F_i.O|$, where $F_i.I$ (resp. $F_i.O$) is the set of
in-nodes (resp. virtual nodes) of fragment $F_i$. However, this
partition problem is intractable [10]. In our implementation
we used Hadoop's default partitioning strategy.

Procedure mapRPQ at each mapper. Upon receiving a pair
<$i$, ($F_i$, $G_q(R)$)>, procedure mapRPQ is triggered at
mapper $M_i$, in parallel. It simply uses procedure localEval_r of
Fig. 7 as its Map function, and computes a key/value pair
<1, $rvset_i$> (line 1), where $rvset_i$ is the vector set as
described in Section 5. It sends the pair to a reducer R. Note
that pairs from all the mappers are sent to the same reducer.

Procedure reduceRPQ at the reducer R. After collecting the
key/value pairs from all the mappers, the reducer puts these
pairs in a set RVset (lines 1-3). It then invokes the
assembling procedure evalDG_r (see Section 5) as the Reduce
function to compute the answer ans to $q_{rr}$ in $G$ (line 4), and
writes a pair <0, ans> to the distributed file system (line 5).
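The dataflow of the three procedures can be simulated in a single Python process, as sketched below. To keep the sketch short it makes a deliberate substitution: the Map and Reduce functions use plain reachability (in the style of disReach) as a stand-in for localEval_r and evalDG_r, and the query is passed as a pair `(s, t)` instead of a query automaton. The fragment encoding `(adj, ins, outs)`, with $s$ already added to `ins` and $t$ to `outs` where applicable, and all function names are assumptions made for this illustration.

```python
from collections import deque

def local_eval(fragment, t):
    """Stand-in Map logic: one Boolean equation per in-node of the fragment."""
    adj, ins, outs = fragment
    rvset = {}
    for v in ins:
        seen, stack = {v}, [v]
        while stack:
            x = stack.pop()
            for y in adj.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        rvset[v] = True if t in seen else (seen & outs) - {v}
    return rvset

def assemble(rvset, s):
    """Stand-in Reduce logic: reachability from s to a true equation."""
    seen, queue = {s}, deque([s])
    while queue:
        x = queue.popleft()
        rhs = rvset.get(x, ())
        if rhs is True:
            return True
        for y in rhs:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

def pre_mrpq(fragments, query):
    # One key/value pair <i, (F_i, query)> per mapper.
    return [(i, (frag, query)) for i, frag in enumerate(fragments, start=1)]

def map_rpq(pair):
    _i, (frag, (_s, t)) = pair
    return (1, local_eval(frag, t))         # same key 1, so all pairs reach one reducer

def reduce_rpq(rdlist, s):
    rvset = {}
    for _key, rvset_i in rdlist:
        rvset.update(rvset_i)
    return (0, assemble(rvset, s))

# Hypothetical two-fragment graph: s -> a -> b in the first, b -> t in the second.
f1 = ({"s": ["a"], "a": ["b"]}, {"s"}, {"b"})
f2 = ({"b": ["t"]}, {"b"}, {"t"})
rdlist = [map_rpq(p) for p in pre_mrpq([f1, f2], ("s", "t"))]
print(reduce_rpq(rdlist, "s"))
```

Each fragment is touched by exactly one `map_rpq` call, mirroring the one-visit-per-site guarantee; only the equation sets travel to the reducer.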

Correctness and complexity. The correctness of
algorithm MRdRPQ immediately follows from the correctness of
algorithm disRPQ (see Section 5). Following [1], we analyze
the performance of MRdRPQ using the elapsed
communication cost ECC (data volume cost), which measures the total
time cost of (parallel) data shipment. We define a process
path $\rho$ of MRdRPQ to be a path from the coordinator to
the reducer, passing a single mapper (see Fig. 9). The cost
of a process path $\rho$ is the sum of the sizes of input data
shipped to the nodes on $\rho$, following the edges of $\rho$. The ECC
of MRdRPQ is the maximum cost over all process paths.
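Since every process path passes through exactly one mapper and then the single reducer, the definition reduces to a one-line computation. The Python sketch below illustrates it; the function name `ecc` and the input sizes are hypothetical.

```python
def ecc(mapper_input_sizes, reducer_input_size):
    """Elapsed communication cost: the maximum cost over all process paths.

    A process path goes coordinator -> one mapper -> reducer, so its cost is the
    input size of that mapper plus the input size of the reducer.
    """
    return max(m + reducer_input_size for m in mapper_input_sizes)

# Hypothetical sizes: three mappers receive fragments of size 120, 300 and 80,
# and the reducer receives partial answers of total size 45.
print(ecc([120, 300, 80], 45))  # 345
```

The largest fragment determines the result, which is why the bound for MRdRPQ is stated in terms of $|F_m|$.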

The ECC analysis unifies the time and network traffic costs
of a MapReduce algorithm. It does not count the in-memory
computation cost of the Map and Reduce functions.
Nevertheless, (1) any indexes and compression techniques
developed for centralized graph query evaluation can be adopted
by mappers, as remarked earlier, (2) further MapReduce
steps can be used to implement both Map and Reduce
functions, and (3) network traffic dominates the total
computation time for real-life large graphs [1].

Procedure preMRPQ

Input: Graph G, regular reachability query q_rr(s, t, R), integer K.
Output: Lists of key/value pairs to be sent to mappers.

1. construct query automaton G_q(R); /* executed at coordinator */
2. glist := parG(G, K, |G|/K); /* graph partition */
3. for each fragment F_i ∈ glist (i ∈ [1, K]) do
4.     pair L := <i, (F_i, G_q(R))>;
5.     send L and G_q(R) to mapper i;

Procedure mapRPQ /* executed at each mapper */

Input: A key/value pair L = <i, (F_i, G_q(R))>.
Output: A key/value pair rdpair.

1. rvset_i := localEval_r(F_i, G_q(R));
2. send rdpair := <1, rvset_i> to a reducer;

Procedure reduceRPQ /* executed at a single reducer */

Input: A list rdlist of key/value pairs.
Output: The Boolean value ans to q_rr in G.

1. set RVset := ∅;
2. for each pair <1, rvset_i> in rdlist do
3.     RVset := RVset ∪ rvset_i;
4. ans := evalDG_r(RVset);
5. return <0, ans>;

Figure 10: Algorithm MRdRPQ

For algorithm MRdRPQ, one can verify the following. (1)
The input size of each mapper is bounded by $O(|F_m|)$, where
$F_m$ is the largest fragment returned by parG. (2) The input
size of the reducer is bounded by $O(|R|^2|V_f|^2)$, where $V_f$ is
the set of nodes in the fragment graph $G_f$. Putting these
together, the ECC of MRdRPQ is $O(|F_m| + |R|^2|V_f|^2)$.

7. EXPERIMENTAL EVALUATION 

We next present an experimental study of our distributed 
algorithms. Using real-life and synthetic data, we conducted 
four sets of experiments to evaluate the efficiency and com- 
munication costs of algorithms disReach (Section 3), disDist 
(Section 4), disRPQ (Section 5) and the MapReduce algo- 
rithm MRdRPQ (Section 6) on Amazon EC2. 

Experimental setting. We used the following data. 

(1) Real-life graphs. For (bounded) reachability queries, we
used the following graphs (footnote 4): (a) a social network LiveJournal, (b)
a communication network WikiTalk, (c) two Web graphs
BerkStan and NotreDame, and (d) a product co-purchasing
network Amazon. The sizes of these graphs are shown below.

dataset        |V|          |E|
LiveJournal    2,541,032    20,000,001
WikiTalk       2,394,385    5,021,410
BerkStan       685,230      7,600,595
NotreDame      325,729      1,497,134
Amazon         262,111      1,234,877

For regular reachability queries, we used the following
graphs with attributes on the nodes: (a) Citation (footnote 5), in which
nodes represent papers with id and venue, and edges denote
citations, (b) MEME (footnote 5), a blog network in which nodes are
Web pages and edges are links, (c) Youtube (footnote 6), a social
network in which each node is a video with attributes (e.g.,
category), and each edge indicates a recommendation, and
(d) Internet (footnote 7), where each node is a system labeled with its id
and location, and each edge represents an internet connection.
The datasets are summarized below, where $|L|$ is the size of the
node label set, and card($F$) is the number of the fragments
generated for regular reachability queries (see below).

4 http://snap.stanford.edu/data/index.html
5 http://www.arnetminer.org/citation/
6 http://netsg.cs.sfu.ca/youtubedata/



dataset     |V|          |E|          |L|      card(F)
Citation    1,572,278    2,084,019    6300     10
MEME        700,000      800,000      61065    11
Youtube     234,452      454,942      12       12
Internet    57,971       103,485      256      10



(2) Synthetic data. We designed a generator to produce 
large graphs, controlled by the number \V\ of nodes, the 
number \E\ of edges, and the size \L\ of node labels. 

(3) Graph fragmentation. We randomly partitioned real-life 
and synthetic graphs G into a set F of fragments, controlled 
by card(F) and the average size of the fragments in F (the 
sum of the numbers of nodes and edges), denoted by size(F). 
Unless stated otherwise, size(F) = |G|/card(F). 

(4) Query generator. We randomly generated (a) reachability
queries, (b) bounded reachability queries with bound $l$,
and (c) regular reachability queries from a set $L$ of labels.

(5) Algorithms. We implemented the following algorithms
in Java: (A) disReach, disReach_n and disReach_m for
reachability queries, where (a) disReach_n ships all the fragments
to a coordinator in parallel, which calls a centralized BFS
algorithm to evaluate the query [31]; and (b) disReach_m is
a message-passing based distributed BFS algorithm
following [21] (see details below); (B) disDist and disDist_n for
bounded reachability queries, where disDist_n is similar to
disReach_n; (C) disRPQ, disRPQ_n and disRPQ_d for regular
reachability queries, where disRPQ_n is similar to disReach_n,
and disRPQ_d is a variant of the algorithm of [30] (see
Section 1); and (D) the MapReduce algorithm MRdRPQ.

Following [21], algorithm disReach_m assigns a worker $S_i$
for each fragment $F_i$, and a master $S_c$ that maintains the
fragment graph (see Section 2). (i) Each node $v$ in the
fragments has a status $l(v) \in$ {inactive, active}, initially inactive.
(ii) A message "T" can be sent only from active nodes $v_1$
(i.e., $l(v_1)$ = active) to their inactive children $v_2$ (i.e., $l(v_2)$
= inactive), which then become active. (iii) No active node
can become inactive again. (iv) $S_i$ can send "T", "idle", or a
virtual node of $F_i$ as a message to $S_c$.

Upon receiving a reachability query $q_r(s,t)$, $S_c$ posts $q_r$
to all the workers $S_i$. For the fragment $F_i$ that contains
the node $s$ specified in $q_r(s,t)$, its worker $S_i$ changes $l(s)$
to active, and sends a message "T" to its immediate
inactive children, which in turn propagate "T" following a BFS
traversal to inactive nodes. During the propagation, (i) if
"T" reaches an inactive virtual node $v$, $S_i$ sends a message
$v$ to $S_c$, which redirects the message to the workers $S_j$ whose
fragments $F_j$ have $v$ as an inactive in-node; $S_j$ then makes $v$
active, and propagates "T" along the same lines in $F_j$; (ii)
if "T" reaches the node $t$ in $q_r(s, t)$, $S_i$ sends message "T"
to $S_c$, and algorithm disReach_m returns true, indicating that
$q_r(s,t)$ = true; and (iii) when no message is propagating
in $S_i$, it sends message "idle" to $S_c$. Algorithm disReach_m
returns false if all the workers send "idle" to $S_c$.
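The message-passing baseline can be simulated sequentially to see why the number of site visits is not bounded. The Python below is a simplified sketch, not the implementation used in the experiments: it processes one redirected activation at a time, counts a visit each time the master hands work to a worker, and assumes that each site stores the out-edges of its own nodes. The names `dis_reach_m`, `fragments` and `owner` are hypothetical.

```python
from collections import deque

def dis_reach_m(fragments, owner, s, t):
    """Return (answer, visits) for a sequential simulation of the protocol.

    fragments: {site: adjacency dict of the nodes stored at that site}
    owner:     {node: site storing the node}
    """
    active = {s}
    pending = deque([s])        # activations the master still has to route
    visits = 0
    while pending:
        start = pending.popleft()
        site = owner[start]
        visits += 1             # the master contacts worker `site`
        queue = deque([start])
        while queue:
            x = queue.popleft()
            if x == t:
                return True, visits        # worker reports "T"
            for y in fragments[site].get(x, ()):
                if y in active:
                    continue
                active.add(y)
                if owner[y] == site:
                    queue.append(y)        # keep propagating locally
                else:
                    pending.append(y)      # virtual node: report it to the master
    return False, visits                   # all workers are idle

# Hypothetical graph s -> a -> b -> c -> t, with s, c, t at site 1 and a, b at site 2.
fragments = {1: {"s": ["a"], "c": ["t"]}, 2: {"a": ["b"], "b": ["c"]}}
owner = {"s": 1, "a": 2, "b": 2, "c": 1, "t": 1}
print(dis_reach_m(fragments, owner, "s", "t"))
print(dis_reach_m(fragments, owner, "c", "a"))
```

On the first query site 1 is visited twice, because the path leaves the site and later re-enters it; a path that crosses fragment boundaries many times forces correspondingly many visits.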

Machines. We deployed these algorithms on Amazon EC2
High-Memory Double Extra Large instances (footnote 8). Each site
stored a fragment. Each experiment was run 5 times and
the average is reported here.

7 http://www.caida.org/data/
8 http://aws.amazon.com/ec2/

               Time (second)                          Traffic (MB)
dataset        disReach  disReach_n  disReach_m       disReach  disReach_n  disReach_m
LiveJournal    12.03     27.52       186.55           174       1800        27
WikiTalk       3.32      9.95        41.42            80        726         19
BerkStan       3.25      8.51        40.31            29        555         11
NotreDame      0.83      3.77        13.32            14        147         7
Amazon         0.55      2.55        7.86             10        120         5

Table 2: Efficiency and data shipment: real life data



Experimental results. We next present our findings. 

Exp-1: Efficiency and scalability of disReach. 

Efficiency. We first evaluated the efficiency of disReach,
disReach_n and disReach_m. Fixing card($F$) = 4, we randomly
generated 100 reachability queries (where around 30%
return "true"), and report the average evaluation time and the
network traffic in Table 2. The results show that disReach
is far more efficient than disReach_n and disReach_m. For
example, on Amazon, disReach takes only about 20% of the running
time of disReach_n, and about 6% of that of disReach_m. On the real
datasets it takes 4 seconds on average.

For the network traffic of disReach_m, we counted the total
number of messages sent between the workers and the
master. Table 2 shows that on average, the network traffic of
disReach is only 9% of that of disReach_n (i.e., the size of the
original graphs), but is not as good as that of disReach_m.
Indeed, the data shipment of disReach_m is linear in the total
number of virtual nodes. However, this reduction comes at
the cost of serializing operations that can be conducted in
parallel, as indicated by its extra running time (Table 2).
Moreover, it has no bound on the number of visits to each
site; for instance, when card($F$) = 4 on Amazon, the four
sites were visited about 2500 times in total.
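The two averages quoted above can be re-derived from the entries of Table 2. The short Python check below copies the disReach and disReach_n columns from the table; the variable names are only illustrative, and the traffic figure is computed as the mean of the per-dataset ratios, which is one plausible reading of "on average".

```python
# (disReach time in seconds, disReach traffic in MB, disReach_n traffic in MB), from Table 2.
table2 = {
    "LiveJournal": (12.03, 174, 1800),
    "WikiTalk": (3.32, 80, 726),
    "BerkStan": (3.25, 29, 555),
    "NotreDame": (0.83, 14, 147),
    "Amazon": (0.55, 10, 120),
}

avg_time = sum(row[0] for row in table2.values()) / len(table2)
avg_ratio = sum(row[1] / row[2] for row in table2.values()) / len(table2)

print(round(avg_time, 2))   # 4.0 seconds on average
print(round(avg_ratio, 2))  # 0.09, i.e., about 9% of the traffic of disReach_n
```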

Scalability. To evaluate the scalability with card(F), we used
Livejournal as the dataset and varied card(F) from 2 to
20. We used the same set of queries as above. Fig. 11(a)
shows that the larger card(F) is, the less time disReach and
disReach_n take. For disReach, this is because partial evaluation
by localEval takes less time on smaller fragments. For
disReach_n, while the evaluation time on the restored graph
remains stable (about 10 seconds), it takes less time to ship
each fragment to the coordinator when card(F) increases. In
contrast, the larger card(F) is, the more costly disReach_m is.
Indeed, smaller fragments require more frequent visits and
thus incur a higher communication cost.
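The dependence of disReach on fragment size can be seen in a minimal sketch of partial evaluation for a reachability query (s, t). This is an illustration under simplifying assumptions, not the paper's pseudocode: fragments are held in memory as (local node set, adjacency dict) pairs, and the names local_eval, assemble and dis_reach are hypothetical.

```python
from collections import deque


def local_eval(local_nodes, edges, in_nodes, s, t):
    """Partial evaluation at one site (hypothetical helper).

    For s (if stored here) and every in-node v, return v -> (hits_t, virtual):
    hits_t tells whether v reaches t using this fragment's edges only, and
    virtual is the set of nodes of other fragments that v reaches locally.
    """
    sources = set(in_nodes)
    if s in local_nodes:
        sources.add(s)
    equations = {}
    for v in sources:
        hits_t, virtual = (v == t), set()
        seen, queue = {v}, deque([v])
        while queue:
            u = queue.popleft()
            for w in edges.get(u, ()):
                if w in seen:
                    continue
                seen.add(w)
                if w == t:
                    hits_t = True
                if w in local_nodes:
                    queue.append(w)   # keep exploring inside the fragment
                else:
                    virtual.add(w)    # cross edge: defer to the owning site
        equations[v] = (hits_t, virtual)
    return equations


def assemble(equations, s):
    """Coordinator: follow the partial answers from s until some node hits t."""
    seen, queue = {s}, deque([s])
    while queue:
        v = queue.popleft()
        hits_t, virtual = equations.get(v, (False, ()))
        if hits_t:
            return True
        for w in virtual:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def dis_reach(fragments, s, t):
    """Simulate one round: each site evaluates once, then results are assembled."""
    # Targets of cross edges; those stored at a site are that site's in-nodes.
    cross_targets = {
        w
        for local_nodes, edges in fragments
        for successors in edges.values()
        for w in successors
        if w not in local_nodes
    }
    equations = {}
    for local_nodes, edges in fragments:
        in_nodes = cross_targets & local_nodes
        equations.update(local_eval(local_nodes, edges, in_nodes, s, t))
    return assemble(equations, s)


# Two fragments: a -> b -> c crosses into the second fragment, where c -> d.
fragments = [
    ({"a", "b"}, {"a": ["b"], "b": ["c"]}),
    ({"c", "d"}, {"c": ["d"], "d": []}),
]
forward = dis_reach(fragments, "a", "d")
backward = dis_reach(fragments, "d", "a")
```

In this sketch each site is consulted once and ships only one entry per in-node, so the local search cost shrinks with the fragment while the shipped data depends on the cross edges rather than on the fragment's size.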

To evaluate the scalability with the average size(F) of
fragments, we generated synthetic graphs following the
densification law [20], by fixing card(F) = 8 and varying the size
of the graphs from 280K to 2.52M. As shown in Fig. 11(b),
when size(F) is increased, so is the running time of all these
algorithms, as expected. Nonetheless, disReach scales well
with size(F), and is less sensitive to size(F) than the others.

We also tested disReach and disReach_m over a larger synthetic
graph, which has 36M nodes and 360M edges. We
varied card(F) from 10 to 20 in increments of 2. The results,
shown in Fig. 11(c), tell us the following: (1) disReach scales
well with card(F), and takes less time over larger card(F);
and (2) disReach_m takes more time when card(F) gets larger.
The results are consistent with the observations from Fig. 11(a).

Exp-2: Efficiency of disDist. This set of experiments
evaluated the performance of disDist and disDist_n. Using
WikiTalk, we varied card(F) from 2 to 20, and randomly
generated 100 bounded reachability queries with
l = 10. Fig. 11(d) shows that (1) disDist outperforms disDist_n
by 62.5% on average, and (2) disDist and disDist_n take less
time over larger card(F), for the same reason as given above.

The performance of disDist and disDist_n (not shown) is
consistent with that of their counterparts (disReach and disReach_n).

Exp-3: Efficiency and scalability of disRPQ. 

Efficiency. The third set of experiments focused on the
performance of algorithms disRPQ, disRPQ_n and disRPQ_d [30],
for regular reachability queries. We specify the complexity
of such a query in terms of (|V_q|, |E_q|, |L_q|), where V_q, E_q
and L_q are the sets of states, transitions and node labels in
its query automaton, respectively (see Section 5.1).
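As a small illustration of this complexity measure, the sketch below computes the triple for a toy automaton. The representation is an assumption made for the example: states are named by strings, each state carries exactly one node label, and query_complexity is a hypothetical helper rather than part of the algorithms evaluated here.

```python
from typing import Dict, Set, Tuple


def query_complexity(
    state_labels: Dict[str, str],
    transitions: Set[Tuple[str, str]],
) -> Tuple[int, int, int]:
    """Return (|V_q|, |E_q|, |L_q|) for a query automaton.

    Assumption: each state carries exactly one node label, so L_q is taken to
    be the set of distinct labels attached to the states.
    """
    num_states = len(state_labels)
    num_transitions = len(transitions)
    num_labels = len(set(state_labels.values()))
    return num_states, num_transitions, num_labels


# Hypothetical automaton with four states; state u1 has a self-loop.
state_labels = {"u_s": "s", "u1": "A", "u2": "B", "u_t": "t"}
transitions = {("u_s", "u1"), ("u1", "u1"), ("u1", "u2"), ("u2", "u_t")}
complexity = query_complexity(state_labels, transitions)
```

Under these assumptions the toy automaton has four states, four transitions and four labels, so a query reported as (8, 16, 8) is twice as large in states and labels and four times as large in transitions.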

We first evaluated the response time and network traffic
of these algorithms on the four real-life datasets described
earlier, with |V|, |E|, |L| and card(F) given there. We
generated 30 regular reachability queries with (|V_q| = 8, |E_q| =
16, |L_q| = 8), and report their average time (resp. network
traffic) in Fig. 11(e) (resp. Fig. 11(f)). We find the following:
(1) disRPQ is more efficient than disRPQ_n and disRPQ_d;
indeed, the running time of disRPQ is 61.8%, 88%, 64.8%
and 56.6% of that of disRPQ_d on Youtube, MEME, Citation
and Internet, respectively; and (2) disRPQ incurs less network
traffic than the other algorithms: on average, at most 25% of the
data shipped by disRPQ_d and 3% of that shipped by disRPQ_n.

To evaluate the impact of query complexity, we used
Youtube and generated 40 regular reachability queries by
varying |V_q| from 4 to 18 and |E_q| from 8 to 36, while fixing
|L_q| = 8. Fig. 11(g) shows that (1) all the algorithms take
longer to answer larger queries, and (2) disRPQ and disRPQ_d
are less sensitive to the size of queries than disRPQ_n.

Scalability. We generated synthetic graphs by fixing card(F)
= 10 while varying the size of the graphs from 350K to
3.15M. We tested 30 queries with |V_q| = 8, |E_q| = 16 and
|L_q| = 8, and report the average running time in Fig. 11(h).
The result shows that disRPQ scales well with size(F), and
performs better than disRPQ_d and disRPQ_n. Moreover, it is
efficient: disRPQ takes 16 seconds on graphs with 1.5M (million)
nodes and 2.1M edges. In addition, the larger size(F)
is, the longer the three algorithms take, as expected.

To evaluate the scalability with card(F), we generated graphs
with 1.2M nodes and 4.8M edges, and varied card(F) from 6
to 20. As shown in Fig. 11(i), the larger card(F) is, the less
time disRPQ takes, since it conducts partial evaluation on
smaller fragments by exploiting parallel computation. This
confirms our complexity analysis for disRPQ (Section 5). Indeed,
the time taken by disRPQ when card(F) = 6 is reduced
by 75% when card(F) = 20. Similarly, disRPQ_d and disRPQ_n
take less time when card(F) is increased.

In addition, we evaluated the scalability of disRPQ and
disRPQ_d over large synthetic graphs. Fixing |V| = 36M, |E|
= 360M and |L| = 50, we varied card(F) from 10 to 20 in
increments of 2. As shown in Fig. 11(j), (1) both algorithms scale
well with card(F), and take less time when card(F) increases;
and (2) disRPQ consistently outperforms disRPQ_d.

Exp-4: Efficiency of MRdRPQ. Finally, we evaluated the
efficiency and scalability of MRdRPQ, implemented using
Hadoop (http://hadoop.apache.org) and deployed on Amazon
EC2, where each instance serves as a mapper. We used
Youtube and four sets of regular reachability queries q_rr,
namely Q1, Q2, Q3 and Q4, of complexities
(4, 6, 8), (6, 8, 8), (10, 12, 8) and (12, 14, 8), respectively.

To evaluate the scalability of MRdRPQ, we fixed the number
of mappers at 10, and varied the graph size from 350K
to 3.15M. As shown in Fig. 11(k), MRdRPQ scales well with
size(F). Moreover, the larger size(F) is or the more complex
a query is, the longer MRdRPQ takes, as expected. To
evaluate its scalability with the number |M| of mappers, we
varied |M| from 5 to 30. As shown in Fig. 11(l), MRdRPQ
takes less time to evaluate queries when more mappers are
used. Indeed, the time taken by MRdRPQ using 5 mappers
is reduced by 50% when 30 mappers are used for Q1.

We also find that, on Youtube, disRPQ takes 17.4% of the running
time of MRdRPQ and incurs 3.7% of its network traffic. The
extra cost of MRdRPQ is incurred in the Map phase of the
MapReduce framework, for distributing data to mappers.

Summary. From the experimental results we find the following.
(1) All of our algorithms scale well with the size
of graphs, the number of fragments, and the complexity
of queries (for disRPQ and MRdRPQ). (2) Our algorithms
are efficient even on randomly partitioned graphs. For instance,
(a) disReach takes 20% and 6% of the running time
of disReach_n and disReach_m over Amazon, respectively, and takes
4 seconds on average over all real-life datasets; and (b) disRPQ
takes 67.8% and 46% of the time of disRPQ_d [30], and ships
47.9% and 45.9% of the data sent by disRPQ_d, on real-life
and synthetic graphs on average, respectively. Overall, our
algorithms ship no more than 11% of the entire graphs on
average. (3) Partial evaluation works well in the MapReduce
model, as verified by the performance of MRdRPQ.

8. CONCLUSION 

We have provided algorithms for evaluating a group of
reachability queries on distributed graphs. Based on partial
evaluation, these algorithms possess performance guarantees on
the number of visits to each site, the total network traffic, and
the response time. Moreover, they are generic: no constraints
are imposed on how the graphs are partitioned and distributed.
We have also shown that partial evaluation can be naturally
conducted in the MapReduce framework. Our experimental study has
verified the scalability and efficiency of our methods. We
conclude that partial evaluation provides a promising approach
to distributed graph query evaluation.

We are currently developing distributed evaluation 
(MapReduce) algorithms for other queries, notably graph 
pattern matching, over larger real-life graphs. Another topic 
is to combine partial evaluation and incremental computa- 
tion, to provide efficient distributed graph query evaluation 
strategies in the dynamic world. 